Improving diffuse optical tomography imaging quality using APU-Net: an attention-based physical U-Net model

Abstract. Significance Traditional diffuse optical tomography (DOT) reconstructions are hampered by image artifacts arising from factors such as DOT sources being closer to shallow lesions, poor optode-tissue coupling, tissue heterogeneity, and large high-contrast lesions lacking information in deeper regions (known as shadowing effect). Addressing these challenges is crucial for improving the quality of DOT images and obtaining robust lesion diagnosis. Aim We address the limitations of current DOT imaging reconstruction by introducing an attention-based U-Net (APU-Net) model to enhance the image quality of DOT reconstruction, ultimately improving lesion diagnostic accuracy. Approach We designed an APU-Net model incorporating a contextual transformer attention module to enhance DOT reconstruction. The model was trained on simulation and phantom data, focusing on challenges such as artifact-induced distortions and lesion-shadowing effects. The model was then evaluated by the clinical data. Results Transitioning from simulation and phantom data to clinical patients’ data, our APU-Net model effectively reduced artifacts with an average artifact contrast decrease of 26.83% and improved image quality. In addition, statistical analyses revealed significant contrast improvements in depth profile with an average contrast increase of 20.28% and 45.31% for the second and third target layers, respectively. These results highlighted the efficacy of our approach in breast cancer diagnosis. Conclusions The APU-Net model improves the image quality of DOT reconstruction by reducing DOT image artifacts and improving the target depth profile.


Introduction
In the United States, breast cancer is the most diagnosed and the second deadliest cancer.1][12][13] MRI, on the other hand, is constrained by the requirement for contrast agent injection, and the diagnostic utility of US for solid masses is also limited.
To address these limitations, our group developed a portable frequency domain US-guided diffuse optical tomography (DOT) system. 15,16DOT utilizes scattered near-infrared light to reconstruct the distributions of optical absorption coefficients at selected wavelengths and map the hemoglobin concentration of the biological tissue. 17,180][21][22][23][24] By incorporating DOT into breast cancer diagnosis, we can potentially improve the accuracy of breast cancer detection and reduce the need for unnecessary biopsies, ultimately improving patient outcomes.
DOT has been proven to downgrade the biopsy recommendation of Breast Imaging Reporting and Data System assessments by 23.5% for benign lesions, 25 implying a huge potential for reducing high false positives in US-based diagnoses.However, as illustrated in Fig. 1, the quality of DOT images is degraded by many problems: source artifacts when imaging shallow lesions, artifacts caused by poor optode-tissue coupling, a mismatch between the reference and target sides, tissue heterogeneity, and lesion posterior shadowing.The reconstruction of shallow targets, which are located close to the probe-tissue interface, tends to include sources on the probe itself that impede getting a correct hemoglobin reading from the images.DOT reconstructions are also sensitive to measurement errors when the probe is in poor contact with the tissue and tissue heterogeneity.The former problem causes hot spots on the non-lesion regions, for example, at the edge of the reconstructed DOT image, while the latter issue introduces multiple target-like objects.Besides, a large, high-absorption lesion absorbs more light from the top layer in depth and causes fewer photons to penetrate deeper layers; thus, the reconstructed absorption profile loses the details in the deeper region.Therefore, improving the quality of DOT reconstruction is essential to downstream tasks, including diagnosing malignant and benign lesions.Recently, many research groups have proposed various deep learning-based methods for DOT reconstruction and quality improvement.Zhao et al. 26 introduced an unroll-DOT framework, utilizing a refined U-Net to enhance DOT images following an unroll-network process.Ko et al. 27 integrated deep neural networks with conventional DOT reconstruction methods, resulting in a notable enhancement in image quality compared with traditional approaches.Yedder et al. 28 presented a multitask deep learning framework for reconstruction and lesion localization in limited-angle DOT.They used physics-based simulations to create synthetic datasets and applied a transfer learning approach to bridge the sensor domain gap between in silico and real-world data, yielding promising results in a clinical example.Deng et al. 24 developed the FDU-Net, which consists of a fully connected subnet, a convolutional encoder-decoder subnet, and a U-Net, for three-dimensional DOT reconstruction, also demonstrating favorable outcomes in one clinical case.However, these approaches have limitations.Many of them lack extensive validation with clinical datasets or have been tested on only one or two patient cases.The challenges inherent in DOT reconstruction, compounded by image resolution limitations, hinder the widespread adoption of deep learning techniques in DOT image enhancement.Besides, the absence of ground truth in clinical images introduces a significant barrier to supervised learning due to the domain shift issues between the simulated and clinical data.
Other than the artifacts of DOT images, one primary challenge in reconstruction is the impact of target size and depth, which can adversely affect the accuracy of the reconstructed absorption maps.Specifically, smaller and deeper targets are often under-reconstructed, which means the reconstructed lesion suffers from a lower absorption coefficient, limiting the diagnostic accuracy.
U-Net has emerged as a pivotal architecture in image enhancement due to its unique ability to preserve spatial information while effectively capturing contextual features.U-Net-based models have demonstrated versatility and effectiveness in image reconstruction applications, [29][30][31][32] such as those by Chen et al. 30 and Chowdary and Yin. 31 The skip connections in U-Net enable the precise reconstruction of images, making it particularly useful for tasks where accurate localization and reconstruction are required.Given these strengths, our use of a U-Netbased model is aimed at achieving high-fidelity reconstructions with improved accuracy, building on its proven track record in image processing tasks.This design helps address the challenges posed by variable target sizes and depths, leading to improved accuracy.
However, U-Net's performance may be affected by the low resolution of functional DOT inputs.Thus, we introduced the contextual transformer (CoT) attention module 33 into the U-Net to obtain a more target-focused and highly generalizable deep learning model.The attention module, a pivotal component in modern neural network architectures, facilitates the dynamic weighting of input features, allowing the model to selectively focus on relevant information.By applying the attention module, the model focused more on the relationship between the adjacent depth layers of the DOT reconstruction and the artifacts around the target in the image.
In this study, we present a novel deep learning framework with an attention module to enhance the contrast and remove the artifacts in DOT reconstruction.The attention-based U-Net (APU-Net) model takes the reconstructed DOT image and the corresponding fine mesh information as the input, predicting the measurement in the bottleneck as the forward model, and then outputs the enhanced DOT images as the solver of the inverse problem for DOT reconstruction.
After training exclusively on simulation and phantom data, the model demonstrated commendable efficacy in enhancing depth contrast and eliminating artifacts in clinical DOT images.To our knowledge, this is the first application of a deep learning model to enhance DOT reconstructions with a large patient dataset, with potential applicability to other DOT systems.
reference signal to generate 20-kHz signals.Following this, the output of the mixer for each channel underwent amplification and filtering at 20 kHz before being sampled by a 16-channel analog-to-digital converter.
For real-time US image guidance, we positioned a commercial linear US probe at the center of the DOT probe to obtain US and DOT measurements of the targeted lesion beneath, as detailed in Ref. 34.The system is depicted in Fig. 2. The probe position was fixed during DOT data acquisition, with a data acquisition time of ∼3 to 4 s for each data set.US recording was paused during DOT data acquisition and resumed once the DOT data collection was complete.

Simulation and Phantom Configuration
To mimic the clinical dataset as much as possible and make our model more generalizable to the clinical dataset, we employed the finite element method (FEM) in COMSOL software and conducted Monte Carlo (MC) simulations using the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) 35 breast phantom to generate forward measurements.The FEM simulation can replicate the DOT reconstruction with artifacts in shallow targets.Meanwhile, the MC simulation sought to reproduce DOT reconstruction with artifacts related to tissue heterogeneity by varying the fat fraction within the digital breast phantom.
For the FEM approach, we approximated complex clinical scenarios as a single-target breast-shaped model.In the setting of geometry, a hemisphere with a radius of 10 cm was employed.Various inclusions, such as a ball, ellipsoid, cube, a ball with two combined hemispheres, a star, and letters, were considered with varying absorption coefficients to improve the complexity of the dataset.In the simulation, we positioned nine sources and 14 detectors based on the geometry of our clinical DOT probe.Further details of this setting can be found in Ref. 36.
In the MC approach, the digital phantom generated by VICTRE, with a radius of 7 cm and height of 5 cm, served as a heterogeneous condition by incorporating various tissue types.Simulations were performed by translating and rotating the probe on the phantom, with fat fractions varying from 20% to 80%.Additional details of the MC simulation can be found in Ref. 37.
For the phantom study, we utilized high-contrast targets made of calibrated polyester resin (μ a ¼ 0.23 cm −1 ) and low-contrast targets made of silicon (μ a ¼ 0.11 cm −1 ).The targets were immersed in an intralipid solution (μ a ¼ 0.02 cm −1 , μ 0 s ¼ 6 − 8 cm −1 ) and placed over a silicon plate whose μ a was similar to that of the solution.We recorded the co-registered US images and the DOT measurements with different targets centralized underneath the probe.
In this study, we utilized 10,975 sets of simulation data and 360 sets of phantom data.Further dataset details are provided in Table 1.

DOT Patient Data
Our US-guided DOT system has been utilized in clinical studies with protocols that received approval from the appropriate Institutional Review Boards and complied with the Health Insurance Portability and Accountability Act. 38,39All participants were fully informed about the study's purpose, procedures, and potential risks before signing a written consent form.To maintain patient confidentiality, all data used in this study were de-identified.Table 2 lists the details of the histologic group, age, histologic diagnosis, and information on shallow cases.

Conjugate Gradient Descent (CGD) Reconstruction
We used the CGD reconstruction as the input for our model.Details about this reconstruction method can be found in Ref. 40.In summary, we modeled photon migration using a diffusion equation for the photon density wave and applied the Born approximation to relate the scattered field (U sc ) to the changes in absorption coefficients (δμ a ), as follows: where W is the weight matrix derived from the diffusion equation for a semi-infinite medium.
The variable m represents the number of measurements, and n represents the number of voxels.
To solve for δμ a , we formulated the inverse problem as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 2 ; 1 1 7 ; 1 5 3 where k • k is the Euclidean norm, δμ 0 a is the initial estimate of the optical properties, and λ is the regularization parameter.This formulation allows us to estimate the changes in absorption coefficients while balancing the data-fitting term and the regularization term, which contributes to the stability and accuracy of the reconstruction.The spatial grid used for reconstruction measures 9 cm × 9 cm × 3.5 cm.This grid is divided into a fine-mesh grid centered at the lesion location and a coarse-mesh grid for the background.The resolutions for the fine and coarse meshes are 0.25 cm × 0.25 cm × 0.5 cm and 1.5 cm × 1.5 cm × 0.5 cm, respectively.Lesions typically occupy one to three depth layers of our seven-layer reconstruction, indicating that most lesions have a vertical diameter of 0.5 to 2 cm.

Overall Structure
The overall network structure, depicted in Fig. 3, comprises two main components: an encoder, which solves the forward diffusion equation, and a decoder, which addresses the inverse problem, mapping perturbations to spatial absorption distributions.
In the encoder, the reconstruction DOT images and their corresponding fine-mesh regions serve as inputs, where the fine mesh delineates a refined grid in the spatial domain.By preserving the background as a coarser mesh grid, computational resources are concentrated on specified areas, enhancing the accuracy of the result.The concatenation of the reconstruction and fine mesh is then fed into an attention-convolution block, comprising a convolution layer followed by a CoT module, as elaborated in Sec.3.3.Subsequently, a pooling layer compresses the features to a lower dimension.This module repeats four times, ultimately reshaping the output into a one-dimensional array.
In the bottleneck, multiple fully connected layers map the one-dimensional features to the perturbation, addressing the forward problem.The mean square error (MSE) loss between the bottleneck output and the perturbation is calculated as part of the final loss equation.The perturbation undergoes additional fully connected layers and is reshaped back to a three-dimensional form.
In the decoder, we begin by concatenating features from the encoder side as a skip connection, strategically employed to prevent overfitting.Then, four attention-conv blocks are utilized to decode these features into the reconstruction.We reinforce the spatial distribution emphasis by again concatenating the fine mesh with the features.Conclusively, two additional convolution layers are applied to acquire the enhanced reconstruction, solidifying the decoder's role as the solver for the inverse problem in the diffusion equation.

CoT Attention Block
A DOT reconstruction, being a functional image, often lacks lesion detail due to low resolution, posing challenges for traditional deep learning models to accurately recognize targets and achieve satisfactory performance.To address this issue and improve the model's focus on target areas within the DOT image, we introduced the CoT block as a self-attention module.The CoT module, designed to aggregate contextual information among input keys for guiding the learning of a dynamic attention matrix, demonstrates substantial potential in visual recognition.By linking neighboring information in keys to queries, it enables adaptive focus on the lesion and surrounding areas.This addition aids the model in understanding features more effectively and assigning attention to relevant areas.
The CoT module's structure, as outlined in Fig. 4(a), involves transforming a 3D feature map X (with dimensions H × W × C) into keys K ¼ X, queries Q ¼ X, and values V ¼ XW v , where W v is the embedding matrix, achieved through a 1 × 1 convolution.Departing from the traditional method, which uses 1 × 1 convolution to encode the keys, the CoT module utilizes k × k convolution over the neighbor keys within the k × k spatial grid.The learned keys, denoted as the mined static context K 1 , take the shape of H × W × C. The module then embeds attention by concatenating K 1 and the Q then processing them with attention embedding A e which includes two consecutive 1 × 1 convolutions with a rectified linear unit activation after the first convolution: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 3 ; 1 1 7 ; 3 3 1 Here, unlike the isolated pairwise convolution, the CoT module combines the query with the key and the surrounding area at each location.Subsequently, the dynamic contextual representation of inputs K 2 is calculated as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 4 ; 1 1 7 ; 2 7 0 where * stands for element-wise multiplication.Finally, the CoT module returns the fusion of the static context K 1 and the dynamic context K 2 as the refined feature map X F as 33 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 5 ; 1 1 7 ; 2 2 3

Loss Function and Training Schemes
To refine the model's focus on lesions within the image, we employed a weighted MSE loss L i for reconstruction, expressed as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 6 ; 1 1 7 ; 1 5 1 where θ represents the APU-Net, δμ a and m f denote the reconstructed DOT images and corresponding fine mesh, respectively, and δμ ða;gÞ signifies the ground truth.Finally, a and b represent the weights in the weighted MSE loss for the target area and background, respectively.L i adjusts weights to prioritize lesion areas, augmenting inclusion weights while reducing background weights.In addition, to measure the semantic difference between the enhanced reconstruction and the ground truth, we utilized a pre-trained VGG16 41 model to extract feature domain loss L f as the perceptron loss 42 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 7 ; 1 1 4 ; 7 0 0 L f ðwÞ ¼ kF VGG ðθðwjδμ a ; m f ÞÞ − F VGG ðδμ ða;gÞ Þk 2 ; (7) where F VGG signifies the features of the pre-trained VGG16.
The MSE loss L p between the bottleneck output and the perturbation, as mentioned in Sec.3.1, is calculated as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 8 ; 1 1 4 ; 6 3 8 where w Ã and θ Ã represent the encoder portion up to the perturbation in APU-Net and U sc represents the perturbation.
In accordance with Sec.2.2, we initially trained the model exclusively on multiple-target simulations, employing a learning rate of 0.0001 over 200 epochs.To manage the learning rate decay, a threshold of 0.01 was established, triggering adjustment if substantial loss drops were not observed within a specified epoch range.Subsequently, we conducted fine-tuning using single-target simulations and phantom data to reflect typical clinic scenarios.This phase employed a learning rate of 0.00005, consistent with the previous weight decay, and spanned 200 epochs.Following the training and fine-tuning, we determined the coefficients for loss calculation to be α ¼ 5, β ¼ 1, γ ¼ 0.01, a ¼ 0.98, and b ¼ 0.02, ensuring a balanced consideration of different components within the loss function from the grid search.
In addition, we applied various data augmentation techniques.These included adding random Gaussian noise to the original DOT reconstruction, rotating images randomly within a range of −45 to 45 deg, and applying random affine transformations.The affine transformations encompassed random variations in the rotation, translation, and scaling.

Evaluation Metrics
We assessed the performance of our model by measuring its effectiveness in removing artifacts and improving target contrast in different depth layers.To evaluate artifact contrast, we introduced the metric C arti , defined as the ratio of the maximum hemoglobin concentration within the artifact region to the maximum hemoglobin concentration within the lesion region E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 0 ; 1 1 4 ; 3 1 3 A lower value of C arti indicates better artifact removal.
For depth contrast, we calculated the hemoglobin contrast among different depth layers.Specifically, we defined C 12 as the ratio of the maximum hemoglobin concentrations between the second and first depth layers, and C 13 as the ratio between the third and first layers.Unlike the artifact contrast, for C 12 and C 13 , a value closer to one suggests that the hemoglobin contrast is consistent across layers, which is desirable.

Test Results on Simulation Dataset
We first evaluated the model's performance using a separate simulation dataset.Figure 5 presents a boxplot comparing the input, output, and ground truth results, providing a detailed overview of the performance.In this dataset, which includes 1647 simulated cases, our APU-net successfully improved the depth contrast for both layers.
For the depth contrast between the first and second layers (C 12 ), our model increased the contrast from an initial average of 0.9926 ðAE0.1048Þ to 0.9989 ðAE0.0353Þ.Similarly, for the depth contrast between the first and third depth layers (C 13 ), the model improved the contrast from 0.9836 ðAE0.1004Þ to 0.9997 ðAE0.0335Þ.More importantly, the APU-Net significantly reduced the variance of the contrast.

Artifact Removal on Clinical Dataset
We then assess the model's generalization to clinical datasets.We leverage examples of clinical DOT hemoglobin maps to showcase the model's efficacy in artifact removal.Notably, the hemoglobin maps are derived from the absorption distribution at four wavelengths. 34    In Fig. 6(a), we observe a malignant case at a shallow depth of 0.6 cm, where the lesion's upper portion is obscured by sources near the probe surface.The model's output on the right presents a more refined target at the center, with all source artifacts effectively removed.Figure 6(b) illustrates another malignant case, characterized by shadow effects stemming from intense absorption in the top layer.Here, our APU-Net model successfully restores the target's absorption, aligning closely with the first layer's shape and value, indicative of a favorable depth profile.Figure 6(c) showcases a benign case with artifacts caused by the heterogeneous background tissue.Here, the model adeptly identifies the lesion at the center while eliminating surrounding artifact-like regions, albeit exhibiting a lower maximum hemoglobin.Next, we compare the artifact contrasts of the original reconstructions and the outputs from the model.These examples underscore the model's seamless transition from simulation and phantom studies to clinical scenarios, demonstrating its robust performance.Subsequently, we provide further statistical analysis of the model's efficacy on clinical data.

Statistics on Clinical Dataset
To elucidate the model's efficacy in enhancing depth profiles within clinical settings, we conducted a comprehensive analysis based on data collected from 83 patients.Among these patients, 45 had DOT reconstructions with more than two layers, while 20 had DOT reconstructions with three layers.In Fig. 8, we depict the maximum hemoglobin contrast as introduced in Sec.3.3.After excluding six cases in the second-layer calculation and three cases in the third-layer calculation due to exceptionally low hemoglobin levels, our APU-Net model demonstrates notable improvement in contrast for deeper layers.Specifically, the contrast for the second layer C 12 increased from 0.7273 ðAE0.1650Þ to 1.0688 ðAE0.1379Þ, while for the third layer C 13 , it improved from 0.3811 ðAE0.1941Þ to 1.0611 ðAE0.1045Þ.
We then examined the depth contrast, separated into benign and malignant groups, as shown in Fig. 9.We observed significant differences in all subgroups (second-layer benign, secondlayer malignant, and third-layer malignant), except for the third-layer contrast in the benign group, where only one patient's data were available, making statistical analysis difficult.In the second layer, the depth contrast improved from 0.7634 ðAE0.1225Þ to 1.0724 ðAE0.1972Þ for the benign group and from 0.7020 ðAE0.1881Þ to 1.0663 ðAE0.0800Þ for the malignant group.For the third-layer contrast in the malignant group, the APU-Net increased the contrast from 0.3892 ðAE0.1987Þ to 1.0665 ðAE0.1063Þ,indicating a significant improvement.
We computed the maximum hemoglobin values, categorized by benign and malignant cases.For benign cases, the average output value is 64.24AE 18.83 μM, which is lower than the input average value of 69.84 AE 12.56 μM.For the malignant cases, the averaged outputs are 82.70AE 20.67 μM, which is higher than the input hemoglobin of 75.73 AE 21.57μM, with a similar variance.There is no statistical significance between the input and output of the benign and malignant subgroups, respectively, which is expected since the goal of the study is to reduce image artifacts and improve the lesion depth profile.Furthermore, our analysis revealed that our APU-Net model enhanced the differentiation between benign and malignant groups, as illustrated in Fig. 10.This finding underscores the potential of our model to enhance diagnostic accuracy in future studies.

Ablation Study
Ablation studies were conducted to evaluate the impact of key design components in APU-Net on synthesis performance.We focused on the effectiveness of the attention module, by removing it to measure the enhancements facilitated by the attention blocks.To assess the impact of employing attention modules in the neural net, we evaluated the artifact contrast along with the depth profiles of the second and third layers in the DOT reconstruction.
Figure 11 presents the results for artifact contrast.We observed a significant improvement when using APU-Net with the attention module than without it.APU-Net achieved an artifact contrast of 0.5580 ðAE0.1678Þ,whereas the configuration without the attention module had an artifact contrast of 0.6530 ðAE0.1964Þ.

Discussion and Conclusion
In this paper, we introduced an APU-Net model designed to enhance the quality of DOT reconstructions, effectively mitigating artifacts, improving depth profiles, and improving contrast in

Fig. 1
Fig. 1 Degraded DOT reconstructions.By rows, from left to right: high-quality reconstruction, reconstruction of shallow targets with source distribution on top, reconstruction with optode coupling issue, reconstruction with multiple objects due to tissue heterogeneity, and the shadow effect caused by the absorption of the top layer.

Fig. 2
Fig. 2 Sketch of the DOT system.The probe is placed on the compressed breast.

Fig. 3
Fig.3Structure of the APU-Net model.The encoder, up to the perturbation, functions as the solver for the photon diffusion equation, while the remaining neural network components serve as the solver for the inverse problem.

Fig. 4
Fig. 4 (a) Structure of the attention-conv block, where the "×" and "þ" blocks denote the elementwise multiplication and addition operations, respectively.(b) Detailed architecture of the attention embedding module.
Figure 6 presents three instances of low-quality reconstructions, accompanied by US images highlighting the targets on the left side, which include several dashed orange lines indicating the different depth

Fig. 5
Fig. 5 Comparative depth contrasts among the input reconstruction, the output of the model, and the ground truth.

Fig. 6
Fig. 6 Examples of low-quality DOT reconstructions and corresponding corrected DOT images from the clinical dataset, with blue ovals indicating the locations of lesions.(a) A shallow malignancy with a source pattern obscuring the lesion's top.(b) A deeper malignancy with shadow effects.(c) A benign lesion with artifacts attributed to tissue heterogeneity.
layers of the lesion, along with one line representing the upper depth layer and one representing the lower depth layer.

Figure 7 (
a), a boxplot of the inputs and outputs, shows the statistical details of the images.Based on 34 patients with artifacts caused by shallow targets or tissue heterogeneity, the model reduced the artifact contrast from 0.8448 ðAE0.2888Þ to 0.5580 ðAE0.1678Þ.Figures 7(b) and 7(c) break down the artifact contrast by benign and malignant groups.For the benign group, the APU-Net decreased the artifact contrast from 0.8691 ðAE0.2901Þ to 0.5787 ðAE0.1755Þ.In the malignant group, the artifact contrast was reduced from 0.7971 ðAE0.1785Þ to 0.4907 ðAE0.1263Þ.These results clearly demonstrate the effectiveness of our model in reducing artifact contrast across different types of lesions.

Fig. 7
Fig. 7 Comparison of artifact contrasts between the input reconstruction and the output of the model.(a) Artifact contrast for all cases.(b) Artifact contrast for the benign group.(c) Artifact contrast for the malignant group.

Fig. 8
Fig.8Contrast of hemoglobin between the second and first layers, as well as between the third and first layers.

Fig. 10
Fig. 10 Maximum hemoglobin levels in benign and malignant groups, categorized by input reconstruction and APU-Net output.

Fig. 9
Fig. 9 Depth contrast subgroup study.(a) Second-layer contrast for the benign group.(b) Secondlayer contrast for the malignant group.(c) Third-layer contrast for the malignant group.

Figure 12
Figure 12 illustrates the contrasts of deeper layers versus the first lesion layer.Removing the attention modules from the model resulted in a second-layer contrast of 1.1169 AE 0.3998 and a third-layer contrast of 0.8976 AE 0.1826.While the model without the attention module showed a similar mean contrast compared with the current model, it also demonstrated larger variances either in the second or the third layer.In addition, the exclusion of the attention module during training led to ∼32% and 23% reductions in hemoglobin value for benign and malignant cases, respectively, emphasizing the critical role of attention modules in enhancing the model's spatial distribution awareness and, consequently, preserving accurate values in the enhanced images.

Table 2
Patient information.

Table 1
Range of parameters used in simulations and phantoms.